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Abstract. A complete solution for an inverse problem needs five main steps: choice of basis func- 
tions for discretization, determination of the order of the model, estimation of the hyperparameters, 
estimation of the solution, and finally, characterization of the proposed solution. Many works have 
been done for the three last steps. The first two have been neglected for a while, in part due to the 
complexity of the problem. However, in many inverse problems, particularly when the number of 
data is very low, a good choice of the basis functions and a good selection of the order become 
primary. In this paper, we first propose a complete solution within a Bayesian framework. Then, we 
apply the proposed method to an inverse elastic electron scattering problem. 



INTRODUCTION 



In a very general linear inverse problem, the relation between the data y = ■ ■ ■ , ?/„ 
and the unknown function /(.) is 

yi= Jhi{r)f{r)dr, i = l,---,m, (1) 

where hi{r) is the system response for the data yi. We assume here that the hi{r) are 
known perfectly. The first step for any numerical processing is the choice of a basis 
function bj{r) and an order k, in such a way to be able to write 

fir) = j2x,b,{r). (2) 



This leads to 



y = Ax + e (3) 
with2;= [yi,---,ymY,x = [xi, ■ ■ ■ ,0;^]* and 

Aij=jh^{r)bj{r)dr, i = 1, ■ ■ ■ ,m, j = 1, ■ ■ ■ , A;, (4) 

where e = [ei, ■ • ■ , e™]* represents the errors (both the measurement noise and the mod- 
eling and the approximation related to the numerical computation of matrix elements 
Aij). Even when the choice of the basis functions bi(r) and the model order k is fixed, 
obtaining a good estimate for x needs other assumptions about the noise e and about 



X itself. The Bayesian approach provides a coherent and complete framework to handle 
the random nature of e and the a priori incomplete knowledge of x. 

The first step in a Bayesian approach is to assign the prior probability laws 
p{y\x,4>,k,l) = pe{y — Ax\4>,k,l), p{x\ip,k,l), p{4>\k,l) and p{tp\k,l), where 
Pe{y — Ax\(f), k, I) is the probability law of the noise, and {(f), the hyperparameters of 
the problem. Note that x represents the unknown parameters, k = dim(a3) is the order of 
the model, m = dim(t/) is the number of the data and / is an index to a particular choice 
of basis functions. Note that the elements of the matrix A depend on the choice of the 
basis functions. However, to simplify the notations, we do not write this dependence 
explicitly. We assume that we have to select one set / of basis functions among a finite 
set (indexed by [1 : Imax]) of them. Thus, for a given / e [1,/rreaa;] and a given model 
order A; G [1, kmax], and using the mentioned prior laws, we define the joint probability 
law 

p{y,x,(f>,il)\k,l) ^p{y\x,(f),k,l)p{x\'ilj,k,l)p{(f)\k,l)p{ijj\k,l). (5) 

From this probability law, we obtain, either by integration or by summation, any 
marginal law, and any a posteriori probability law using the Bayes rule. 
What we propose in this paper is to consider the following problems: 

• Parameter estimation: 

x^argmax{p{x\y,^,^jj,k,T)} , (6) 



X 



where 



and 



p{x\y,4>,tjj,k,l) ^p{y,x\4>,'il),k,l)/p{y\4>,ij},k,l), (7) 
p{y,x\(j),'ijj,k,l) ^p{y\x,(f),k,l)p{x\ip,k,l) (8) 



p(?/|</),'0,A;,O ^Jp{y,x\(j),'ip,kJ)dx. (9) 
Hyperparameter estimation: 

(^,-0) = argmax{p((/),i/'|2/,fc,/)}, (10) 
(</>-V') 

where 

p{4>,tj}\y,k,l) ^p{y,(l>,il)\k,l)/p{y\k,l) (11) 

and 

piy \k,l) = ffpiy, 4>,^\k, I) d(l>d^. (12) 



Model order selection: 

k — argma.x^p(k\y,V)^ , (13) 

where 

p{k\y,l)^p{y\k,l)p{k)/p{y\l) (14) 

and 

p{y\l)=Y.PiyW)p{k). (15) 
fe=i 



• Basis function selection: 

r=argmax{p(Z|?/)}, (16) 

I 

where 

p{i\y)-p{y\i)p{i)/p{y) (iv) 

and 



max 



p{y)-J2p(y\i)p(i)- (18) 



1=1 



• Joint parameter, hyperparameter, model order and basis function estimation: 

(x,4),'ii),k,l) = argmax {p{y,x,(j),ijj\k,l)p(k)p(l)}. (19) 

{X,(t),1p,k,l) 

As it can be easily seen, the first problem is, in general, a well posed problem and the 
solution can be computed, either analytically or numerically. The others (except the last) 
need integrations. These integrals can be done analytically only in the case of Gaussian 
laws. In other cases, one can either use a numerical integration (either deterministic or 
stochastic) or to resort to approximations such as the Laplace method which allows us 
to obtain a closed-form expression for the optimality criterion. 

Here, we consider these problems for the particular case of Gaussian prior laws: 



p{y\x,(f),k,l) = =(27r/0)-"^/'exp 



p{x\'iP,k,l) = AA^O,^/j =(27r/V')"''/'exp 



1 / II 112' 

— lb \\x\\ 



(20) 
(21) 



where ^ and ^ are respectively the variance of the noise and the parameters. 



-^(/)||2/-Aa;|p-^V||ic||^ 



(22) 



PARAMETER ESTIMATION 

First note that in this special case we have 

p{y, x 1 0, V, k, I) = (27r/(/.)-"*/' (27r/V')"'/' exp 
Integration with respect to x can be done analytically and we have: 

p{y\(l),'ilj,k,l)^Jp{y,x\(l),jlj,k,l)dx^X(0,Py), (23) 

with 

Py = ^AA' + V = ^{AA' + XI) and A = ^. (24) 
It is then easy to see that the a posteriori law of x is also Gaussian: 



p{ 



x\y,(i),ij,k,l)=M{x,P) with P = ^(A*A + A/)"^ and x = <pPA^y. (25) 



Thus the parameter estimation in this case is straightforward: 

x = argmax{p(a; 1 1/, 0,-0,/^,/)} = argmin{ Ji(a;)} , (26) 

X X 

with 

Ji{x)^\\y-Ax\\^ + X\\x\\'', (27) 

which is a quadratic function of x. The solution is then a linear function of the data y 
and is given by 

x^K{X)y with K{\)^{A^A + XI)-^AK (28) 



HYPERPARAMETER ESTIMATION 



For the hyperparameter estimation problem we note that: 



p{y\k,l) 



p{y\kj) 

Thus, the hyperparameter estimation problem becomes 



1 



^i^MzW^(27r)-/^|P,rV^exp --y^P-'y 



2" 



(29) 



J,^) = argmaxlp{(f),i/j\y,k,T)\ = argmin{ Jsl^,^)} (30) 



where 



J2 (0, V) = - lnp(0 \k,l)- lnp(V' I A;, + - In I P, I + - y'Py'y. (31) 

Unfortunately, in general, there is not an analytical expression for the solution, but this 
optimization can be done numerically. Many works have been investigated to perform 
this optimization appropriately for particular choices of p(0 1 k, I) and p{ip \k,l). Among 
the others, we may note the choice of improper prior laws such as Jeffreys' prior 
p{(f) \ k,l) (X ^ and p{ip \k,l) (x ^ or proper uniform prior laws p{(f)\k,l) = ^ — 

and p{^jJ \k,l) = — or still the proper Gamma prior laws. 

One main issue with improper prior laws is the existence of the solution, because 
p{(j),ip I y,k,l) may not even have a maximum or its maximum can be located at the 
border of the domain of variation of (0, i/j). Here, we propose to use the following proper 
Gamma priors : 

p((/.) = e?(ai,A)oc0("^-i)exp[-/3i0]^E{(i)} = ai//3i (32) 
pW = 6?(a2,/9)ocV^("^-^)exp[-/32V']^E{V^} = a2//92. (33) 

With these priors, we have 

J2(0, V') = (1 - ai) ln</. + (1 - a2) In + + /92V' + ^ In | Pj,| + ^ v'Py^V (34) 



The second main issue is the numerical optimization. Many works have been done 
on this subject. Among the others we can mention those who try to integrate out one 
of the two parameters directly or after some transformation. For example transforming 
— > (0, A) and using the identities 



AA* + XI 



A 



m—k 



A^A + \I 



and 

we have 
and 



{AA' + XI)-'^-{I-AK{X)), 



InjPj^l = — mln0 — /cln A + ln 



A*A + AJ 



y'Py 'y = (l>y\l - AK{\))y = ((>y\y - Ax) = (t>y\y - y). 
Then, we obtain 



[l-ai- 



m—k ' 



ln0 + (1 - as - I) In V' + + /?2V' 



+iln A'A+^I 



or 



J2(0,A) = (2-ai-a2-f )ln0+(l-a2-|)lnA + /3i0 + /320A 



+ iln 



A'A + \I 



(35) 

(36) 

(37) 
(38) 

(39) 
(40) 



For fixed A, equating to zero the derivative of this expression with respect to gives an 
explicit solution which is 



9^2(0, A) 

d(j) 







,m 



+ ai + a2-2) / 



/3i + \/32 + ^y\y-y) 



(41) 



Putting this expression into J2 we obtain a criterion depending only on A which can be 
optimized numerically. In addition, it is possible to integrate out to obtain p{X\y, k, I), 
but the expression is too complex to write. 



JOINT ESTIMATION 

One may try to estimate all the unknowns simultaneously by 

(x, I) ^ argmsix {p{x, (I), 'ijj,k,l\y} = argmin {J3(x,(f),'ijj, k, I)} , (42) 

{x,(i>,^,k,i) {x,4>,ip,k,i) 

where 

J^{x,(t),^,k,l) = -lnp(A;)-lnp(0-(f + o;i-l)ln0 

-(I + o;2 - 1) InV' + (a + - Aa;||2) + (/32 + . 

(43) 



The main advantage of this criterion is that we obtain explicit solutions for x, and 
by equating to zero the derivatives of J2,{x, 0, V', k, I) with respect to them: 



x = {A*A + \I)-^A^y, with A = 0/V'; 
= (f + ai - 1)/ (a + 1 112/ - Axf) ■ (44) 

VJ=(| + a2-l)/(/32 + |||S|P). 

We cannot obtain closed form expressions for k and / which depend on the particular 
choice for andp(/). These relations suggest an iterative algorithm such as: 



"max 

max 



Joint MAP estimation algorithm 1 

for Z = 1 : 

for A; = 1 : k,n 

compute the elements of the matrix A; 
initialize A = Aq; 
repeat until convergency: 

x = {A'A + XI)-^A'y- 

^=(| + a2-l)/(/?2 + iW) -^^-<t>/i^ 
end 

compute J{k,l) — J3($,0,'0, 
end 
end 

choose the best model and the best order by 

(f,^) =argmin(^^;){J(A;,0} 



Note however that, for fixed x, and ijj, the criteria J3 in (43) or J5 in (47) are mainly 
linear functions of k if we choose a uniform law for p{k). This means that we may not 
have a minimum for these criteria as a function of k. The choice of the prior p( A;) is then 
important. One possible choice is the following: 

/c > kjjiax 

which is a decreasing function of k in the range k E [l.kmax] and zero elsewhere. This 
choice may insure the existence of a minimum if kmax is chosen appropriately. For 
we propose to choose a uniform law, because we do not want to give any favor to any 
model. 

Another algorithm can be obtained if we replace the expression of x into J3 to obtain 
a criterion depending only on (0, V'): 

Ji{(t>,il^,k,l)^ -lnp(A;)-lnp(0-(f + «i-l)ln0-(| + «2-l)lnV' 

+0 (A + i 111/ - y (A) IP) + V' (/?2 + i lis (A) IP) ^"^^^ 



or on (0, A): 

J5((/), A, k, I) = -lnp(fc) - lnp(0 - + ai + - 2) ln0 - (| + as - 1) In A 

|2~ 



+0 (a + - j/(A)|p) + (A0) + |||2(A)|| 

(47) 

and then optimize it with respect to them. In the second case, we can again obtain first 
(f) and put its expression 



'm + k \ . 

— h «! + a2 - 2 / 



{(ii+\\\y-m\?)+m+\\\mf) 



(48) 



in the criterion to obtain another criterion depending only on A and optimize it numeri- 
cally. This gives the following algorithm: 



Joint MAP estimation algorithm 2 

for / 1 . Iraax 

for k — \ '. kjnax 

compute the elements of the matrix A; 

for A e lOl-^^i^^l 

compute X = (A* A + \I)~^A^y and y = Ax 

compute (p using (eq. 48) 

compute J(A) = J^{(t),\,k,l) (eq. 47) 

end ^ 

choose A = argmin;^ {-^(A)} 
compute X = (A* A + \I)~^A^y; 
compute (p using (eq. 48); 
compute J{k,l) = J5{(j),\k,l) (eq. 47) 
end 
end 

choose the best model and the best order by 

(^) = argmin(fc^;){J(A;,/)} 



MODEL ORDER SELECTION 

The model order selection 

k = argmax{p(A; | y,l)} = argmin{ J6(A;)} , (49) 

k k 

with 

Je(k)^-\np(k)-\np(y\k,l), (50) 

needs one more integration 

p{y\k,l)^ I p{y,ct>Mk.l)m'4^. (51) 



or 

Piy\k,l)^ I p{y,^,X\k,l)^H\ (52) 

where p(?/,0,A|A;,/)oc exp [— J2 (0, A) ] given by (40). As we mentioned in the preceeding 
section, these integrations can only be down numerically. A good approximation can be 
obtained using the following: 

p{y\k,l) = j j p{y,(t),'il]\k,l)A(t)d.'iljc^Y.Hpiy\^v^i^k,l), (53) 

i 3 

where and [ipi] are samples generated using the prior laws and 



BEST BASIS OR MODEL SELECTION 

The model selection 

I — argmax{p(/ 1 y)} — argmin{J7(Z)} (54) 

I I 

with 

Ml)^-\np{l)-\np{y\l) (55) 

does not need any more integration, but only one summation. Choosing p{l) uniform 
and making the same previous approximations we have 

Ml) ^ -In J^p{y\k,l)p{k). (56) 

k=l 



PROPOSED ALGORITHMS 

Based on equations (55), (53), (50), (39) and (40), we propose the following algorithm: 



Marginal MAP estimation algorithm 2 



Generate a set of samples {(pj} drawn from 
Generate a set of samples {ipi} drawn fromp(^) 

for / 1 . Ijnax 

for k = 1 '. kjnax 

compute the elements of the matrix A; 

for (f) e 

for ^ e {ipi} 

compute A = 4>/ip, x = {A^A + XI)~^A^y and y = Ax 
compute p^{i,j,k, I) = exp [- J2(0j, V'?)] (eq. 39) 
end 

normalize J,/.-,/) = p^{ij,k,l) / T,iP^{iJ:k,l) 
compute p^{j,k, I) = EiP^{i,j,k,l) 



normalize p^{j, k,l) = p^{j,k,l) / J2jP4>U, 
compute pk{k, I) = T,jP4,{j, k, I) 
end 

normalize pk{k, I) = pk{k, I) / EkPkik, I) 
compute = J2kPk{k,l) 



compute the elements of the matrix Aforl — I and k — k 
compute X = {A^A + XI)'~^A^y. 



APPLICATION: ELECTRON SCATTERING DATA INVERSION 



Elastic electron scattering provides a means of determining the charge density of a nu- 
cleus, p(r), from the experimentally determined charge form factor, F{q). The connec- 
tion between the charge density and the cross section is well understood and in plane 
wave Bom approximation F{q) is just the Fourier transform of p{r), which for the case 



end 



end 



normalize Pi (0 = Pi{l) / EiP{l) 

choose the best model by I — argmax^ {pi{l)} 




:j{p^{jXk)} 
■i {pA'^,jXk)} 



of even-even nuclei, which we shall consider, is simply given by 

roc 

F(q) = A7r r"^ Jo(qr)p(r)dr, (57) 

^0 

where Jq is the spherical Bessel function of zero order and q is the absolute value of the 
three momentum transfer. 

We applied the proposed method with the following usual discretization procedure: 

,(,)^|E,t,-,f.iW (58) 



which results in 



and 



F{q)^A7r^Xj [ V (gr) 6,- (r) dr (59) 
j=l •'0 



y = Ax + e, (60) 

where a; is a vector containing the coefficients {xj,j = !,■■■, A;},?/ is a vector containing 
the form factor data {F{qi),i = l,---,m} and A an {m x k) matrix containing the 
coefficients Ai^j given by 

A,,=47r rV2jo(gir)6,(r)dr. (61) 

J 

To compute Ai^ we define a discretization step Ar = Rc/N, a vector r — {r„ = 
{n — 1) Ar, n = 1, • • • , A^}, a {N x k) matrix B with elements Bnj — bj (r„), a (mx N) 
matrix C with elements Cj.„ = (47rAr)r^Jo(gjr„) such that we have A = CB. Note 
also that when the vector x is determined, we can compute p = {p(r„), n= 1, • • • , N} 
hy p — Bx. 

To test the proposed methods, we used the following simulation procedure: 

• Select a model type I and an order k and generate the matrices B, C and A, and 
for a random set of parameters x generate the data y — Ax. 

• Add some noise eonyio obtain y — Ax-\-e. 

• Compute the estimates l,k,x,y — Ax and p — Bx and compare them with /, k, 
x,y = Ax and p = Bx. 

We chose the following basis functions: 

•1=1: bj{r) = J(i{qjr) — This is a natural choice due to the integral kernel and 

the orthogonality property of the Bessel functions. 
•1 = 2: bj{r) = smc{qjr) — This choice is also natural due to the orthogonality 

and the limited support hypothesis for the function p(r). 

•1 = 3: bj(r) = exp —^{qjrY — This choice can account for the positivity of 
the function p{r) if the {xj} are constrained to be positive. 

•1 = 4:: bj{r) = exp — |(?jr)^ JoiQj"^) — This choice combines the first and the 
third properties. 



'1 = 5: hj{r) = 1/ (cosh(gjr)) — This choice has the same properties as the third 
one. 

• / = 6 : 6j(r) = 1/(1 + {qjrY) — This choice has the same properties as the third 
one. 

In all these experiments we chose A; = 6, m = 20, = 100, Rc = 8 and = iir/Rc. 
The following figures show typical solutions. Figures 1 and 2 show the details of the 
procedure for the case 1 = 1. Figures 3, 4 and 5 show the results for the cases / = 1 to 
1 = 6. 
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Fig. 1: a) basis functions bj{r), b) p{r), c) data F{qi) in a logarithmic scale. 



model number 



model order 



p{Kl) 



0.8 



0.6 



0.4 



0.2 




1 2 3 4 5 6 
model number 



0.1 r 
0.08 
0.06 
0.04 
0.02 




1 2 3 4 5 6 78 910I11213I4I5 
model order 



p{l) andp(fc|/) 




i=i 

and 

k 





k 




F{qi) = J2AjX 
i=i 


• 

• 


and 

k 


• • 





Fig. 2: a) p{k,l\y), p{l\y) and p{k\y, I), b) original p(r) and estimated p(r), 
c) original F{qi) and estimated F{qi). 




Fig. 3: Left: / = 1 Right: / = 2 

a) basis functions bj{r), b) p{k\y) and p(k\y, I), c) p(r) and p(r), 

d) F{qi) and F{qi). 




Fig. 4: Left: / = 3 Right: I = 4 

a) basis functions bj{r), b) p{k\y) and p{k\y, I), c) p(r) and p(r), 

d) F{qi) and F{qi). 




Fig. 5: Left: / = 5 Right: I = 6 

a) basis functions bj{r), b) p{k\y) and p{k\y, I), c) p(r) and p(r), 

d) F{qi) and F{qi). 



Note that in these tests, we know perfectly the model and generated the data according 
to our hypothesis. To test the method in a more realistic case, we choose a model for 
which we can have an exact analytic expression for the integrals. For example, if we 
choose a symmetric Fermi distribution [4] 



p(r) 



a 



cosh{R/d) 



cosh(i?/ d) + cosh(r/(i) 



(62) 



an analytical expression for the corresponding charge form factor can easily be ob- 
tained [5]: 



F{q) 



An'^ad cosh.{R/d) 



q smh{R/d) 



Rcos{qR) TT d sm(q R) cosh{Tr qd) 



smh.{7r qd) 



smli^ (nqd) 



(63) 



Only two of the parameters a, R and d are independent since the charge density must 
fulfill the normalization condition 



477 J r"^ p{r)dr = Z. 



(64) 



Figure 6 shows the theoretical charge density p(r) of ^^C (Z=6) obtained from (62) 
for r G [0, 8] fm with R= 1.1 A and d = 0.626 fm and the theoretical charge form factor 
F{q) obtained by (63) for q E [0,8] fm^^ and the 15 simulated data: 

qr= [0.001, .5, 1.0, 1.5, 2.0,2.5, 3.0,3.5, 4.0,4.5, 5.0,5.5, 6.0,6.5, 7.0] fm"^ 

which are used as inputs to the inversion method. 





Momentum transfer 



Fig. 6: Theoretical charge density p(r), charge form factor log \F{q) \ and the data 
[stars] used for numerical experiments [right]. 



First note that, even with the exact data, there are an infinite number of solutions 
which fits exactly the data. The following figure shows a few sets of these solutions. 




Fig. 7: a)p{k,l\y) 

h)p{k\y) c)p{k\y,T) 

d) p(r) and p(r) e) F{qi) and F{qi). 



CONCLUSIONS 



We discussed the different steps for a complete resolution of an inverse problem and 
focused on the choice of a basis function selection and the order of the model. An 
algorithm based on Bayesian estimation is proposed and tested on simulated data. 



REFERENCES 

1. J. L. Friar and J. W. Negele, Nucl. Phys. A 212, 93 (1973). 

2. B. Dreher et al, Nucl. Phys. A 235, 219 (1974). 

3. J. Heisenberg and H. P Blok, Ann. Rev. Nucl. Part. Sc. 33, 569 (1983). 

4. M. E. Grypeos, G. A. Lalazissis, S. E. Massen, and C. P. Panos, J. Phys. G 17, 1093 (1991). 

5. R. E. Kozak, Am. J. Phys. 59, 74 (1991). 

6. J. Baker-Jarvis, J. Math. Phys. 30, 302 (1989). 

7. J. Baker-Jarvis, M. Racine, and J. Alameddine, J. Math. Phys. 30, 1459 (1989). 

8. N. Canosa, H. G. Miller, A. Plastino and R. Rossignoli, Physica A220, 611 (1995). 

9. Buck and Macaulay, "Linear inversion by the method of maximum entropy," in Maximum Entropy 
and Bayesian Methods 89, (J. Skilling, ed.), Kluwer Academic Publishers, 1990. 

10. A. Mohammad-Djafari, "A full Bayesian approach for inverse problems," in Maximum Entropy and 
Bayesian Methods 95, (K. Hanson and R. Silver, ed.), Kluwer Academic Publishers, 1996. 

11. D.J.C. MacKay, "Hyperparameters: Optimize or integrate out?" in Maximum Entropy and Bayesian 
Methods 93, (G. Heidbreder, ed.), pp. 43-59, Kluwer Academic PubUshers, 1996. 

12. V.A. Macaulay and B. Buck, "A fresh look at model selection in inverse scattering," in Maximum 
Entropy and Bayesian Methods 94, (J. SkilUng and S. Sibisi ed.), Kluwer Academic PubUshers, 
1996. 

13. A. Mohammad-Djafari and J. Idler, "A scale invariant Bayesian method to solve linear inverse 
problems", pp. 121-134. in Maximum Entropy and Bayesian Methods 94, (G. Heidbreder, ed.), 
Kluwer Academic Publishers, 1996. 

14. A. Mohammad-Djafari and J. Idler, "Maximum entropy prior laws of images and estimation of their 
parameters," pp. 285-293. in Maximum Entropy and Bayesian Methods 90, (T. Grandy, ed.), Kluwer 
Academic Publishers, 1991. 



